52 research outputs found

    Smart Trip Alternatives for the Curious

    When searching for flights, current systems often suggest routes involving waiting times at stopovers. There may exist alternative routes that are more attractive from a touristic perspective: their duration is not necessarily much longer, while they offer enough time in an appropriate place. Choosing among such alternatives requires additional planning effort to make sure that, e.g., points of interest can conveniently be reached in the allowed time frame. We present a system that automatically computes smart trip alternatives between any two cities. To do so, it searches for points of interest in large semantic datasets, considering the set of accessible areas around each possible layover. It then selects feasible alternatives and displays their differences with respect to the default trip.
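
    As an illustration of the kind of lookup such a system performs, the hypothetical Python sketch below queries Wikidata for tourist attractions within a given radius of a layover's coordinates; the endpoint, coordinates, radius, and the class wd:Q570116 (tourist attraction) are assumptions made for the example, not details taken from the paper.

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical example: points of interest within 10 km of a Paris layover.
# Endpoint, coordinates, radius and POI class are illustrative assumptions.
endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
endpoint.setReturnFormat(JSON)
endpoint.setQuery("""
PREFIX wd:       <http://www.wikidata.org/entity/>
PREFIX wdt:      <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd:       <http://www.bigdata.com/rdf#>
PREFIX geo:      <http://www.opengis.net/ont/geosparql#>
SELECT ?poi ?poiLabel WHERE {
  SERVICE wikibase:around {
    ?poi wdt:P625 ?coords .
    bd:serviceParam wikibase:center "Point(2.3522 48.8566)"^^geo:wktLiteral ;
                    wikibase:radius "10" .
  }
  ?poi wdt:P31/wdt:P279* wd:Q570116 .   # instances of (subclasses of) tourist attraction
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 50
""")
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["poiLabel"]["value"])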

    SPARQLGX in Action: Efficient Distributed Evaluation of SPARQL with Apache Spark

    We demonstrate SPARQLGX, our implementation of a distributed SPARQL evaluator. We show that SPARQLGX makes it possible to evaluate SPARQL queries over billions of triples distributed across multiple nodes, while providing attractive performance figures.
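
    The sketch below gives a minimal, hypothetical illustration in PySpark of the general principle behind such distributed evaluators: triples are loaded into a DataFrame and a basic graph pattern is evaluated as filters plus joins on shared variables. It is not SPARQLGX's actual code (SPARQLGX compiles SPARQL into Scala/Spark programs); the file path, schema and predicates are assumptions.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("bgp-join-sketch").getOrCreate()

# Triples stored as tab-separated lines: subject, predicate, object (placeholder path).
triples = (spark.read.csv("hdfs:///data/triples.tsv", sep="\t")
           .toDF("s", "p", "o"))

# Pattern: ?person ex:knows ?friend . ?friend ex:livesIn ex:Paris
knows = (triples.filter(col("p") == "<http://example.org/knows>")
         .select(col("s").alias("person"), col("o").alias("friend")))
lives = (triples.filter((col("p") == "<http://example.org/livesIn>") &
                        (col("o") == "<http://example.org/Paris>"))
         .select(col("s").alias("friend")))

# Join on the shared variable ?friend, as a SPARQL engine would.
result = knows.join(lives, on="friend")
result.show()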

    Federated Query Processing

    Big data plays a significant role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Semantic web technologies have also experienced great progress, and scientific communities and practitioners have contributed to the problem of big data management with ontological models, controlled vocabularies, linked datasets, data models, query languages, as well as tools for transforming big data into knowledge from which decisions can be made. Despite the significant impact of big data and semantic web technologies, we are entering a new era where domains like genomics are projected to grow very rapidly over the next decade. In this new era, integrating big data demands novel and scalable tools that enable not only big data ingestion and curation but also efficient large-scale exploration and discovery. Federated query processing techniques provide a solution for scaling up to large volumes of data distributed across multiple data sources. They resort to source descriptions to identify the data sources relevant to a query, as well as to find efficient execution plans that minimize the total execution time of the query and maximize the completeness of the answers. This chapter summarizes the main characteristics of a federated query engine, reviews the current state of the field, and outlines the problems that remain open and represent grand challenges for the area.
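
    As a toy illustration of the source-selection step described above, each endpoint can be described by the set of predicates it exposes, and every triple pattern is then routed only to the endpoints that may answer it. The endpoint URLs and predicate sets below are invented placeholders; real engines use richer descriptions (e.g. VoID statistics, characteristic sets or ASK probes) together with cost-based planning.

# Hypothetical predicate-based source descriptions: endpoint URL -> exposed predicates.
SOURCE_DESCRIPTIONS = {
    "http://endpoint-a.example/sparql": {"foaf:name", "foaf:knows"},
    "http://endpoint-b.example/sparql": {"dbo:birthPlace", "foaf:name"},
}

def relevant_sources(triple_pattern):
    """Return the endpoints whose description contains the pattern's predicate.
    An unbound (variable) predicate cannot be pruned and matches every source."""
    _, predicate, _ = triple_pattern
    if predicate.startswith("?"):
        return set(SOURCE_DESCRIPTIONS)
    return {ep for ep, preds in SOURCE_DESCRIPTIONS.items() if predicate in preds}

# A query with two triple patterns; each is routed only to its relevant sources.
query = [("?p", "foaf:name", "?n"), ("?p", "dbo:birthPlace", "dbr:Nantes")]
for tp in query:
    print(tp, "->", relevant_sources(tp))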

    Involvement of OpenStreetMap in European H2020 Projects


    Context-Based Entity Matching for Big Data

    In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows entities to be described under various contexts, e.g., people can be described in their demographic context as well as in their professional context. Context-aware description poses challenges during entity matching of RDF datasets: the match might not be valid in every context. To perform contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration, is performed must be taken into account. However, existing approaches only consider inter-schema and property mappings across data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. Finally, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm to match RDF entities based on the combined scores. We empirically evaluate the performance of COMET on a testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.
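
    To make the final step concrete, the hypothetical sketch below performs a generic 1-1 assignment over a matrix of combined similarity scores using SciPy; the entity names and scores are invented placeholders, and COMET's actual matching procedure may differ in its details.

import numpy as np
from scipy.optimize import linear_sum_assignment

entities_a = ["ex:Alice", "ex:Bob", "ex:Carol"]
entities_b = ["ex:A_Smith", "ex:B_Jones", "ex:C_Lee"]

# Combined, context-aware similarity scores in [0, 1] (placeholder values).
similarity = np.array([[0.9, 0.1, 0.2],
                       [0.2, 0.8, 0.3],
                       [0.1, 0.4, 0.7]])

# linear_sum_assignment minimizes total cost, so negate to maximize similarity.
rows, cols = linear_sum_assignment(-similarity)
for i, j in zip(rows, cols):
    print(f"{entities_a[i]} <-> {entities_b[j]}  score={similarity[i, j]:.2f}")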

    Multi-Level Visual Tours of Weather Linked Data

    The recent trend of adopting linked-data principles to integrate and publish semantically described open data using W3C standards has led to a large amount of available resources. In particular, meteorological sensor data have been uplifted into public weather-focused RDF graphs, such as WeKG-MF, which offers access to a large set of meteorological variables described through spatial and temporal dimensions. Nevertheless, these resources include huge numbers of raw observations that are tedious for lay users to explore. In this article, we aim to provide them with visual exploratory "tours", benefiting from RDF data cubes to present high-level aggregated views together with on-demand fine-grained details through a unified Web interface.
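
    The hypothetical sketch below shows the kind of aggregation query that can back such a high-level view, computing monthly averages of one observed variable with standard SOSA terms; the endpoint URL and the property IRI are placeholders, and WeKG-MF's actual modelling may differ.

from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint and observed-property IRI; the SOSA terms are standard W3C vocabulary.
sparql = SPARQLWrapper("https://example.org/wekg/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX sosa: <http://www.w3.org/ns/sosa/>
SELECT ?month (AVG(?value) AS ?avgTemp) WHERE {
  ?obs a sosa:Observation ;
       sosa:observedProperty <http://example.org/property/airTemperature> ;
       sosa:resultTime ?t ;
       sosa:hasSimpleResult ?value .
}
GROUP BY (MONTH(?t) AS ?month)
ORDER BY ?month
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["month"]["value"], row["avgTemp"]["value"])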

    De-icing federated SPARQL pipelines: a method for assessing the "freshness" of result sets

    In recent years, the ever-increasing number of available linked-data endpoints has allowed the creation of complex data pipelines leveraging these massive amounts of information. One crucial challenge for federated pipeline designers is to know when to query the various sources they use in order to obtain fresher final results. In other words, they want to know when a data update on a specific source impacts their own final results. Unfortunately, the SPARQL standard does not provide them with a method to be aware of such updates, and therefore pipelines are regularly relaunched from scratch, often needlessly. To help designers decide when to get fresher results, we propose a constructive method. Practically, it relies on digitally signing result sets from federated endpoints in order to create a specific query able to warn when, and explain why, the pipeline result set is outdated. In addition, as our solution is exclusively based on SPARQL 1.1 built-in functions, it is fully compliant with all endpoints.
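
    A minimal sketch of the underlying idea, assuming a hypothetical endpoint: fingerprint the relevant data with SPARQL 1.1 built-ins (SHA256 over a GROUP_CONCAT of ordered rows) and relaunch the pipeline only when the fingerprint changes. This is a simplified illustration of the general freshness check, not the paper's actual signing queries.

from SPARQLWrapper import SPARQLWrapper, JSON

# Fingerprint an endpoint's data using only SPARQL 1.1 built-in functions.
# The graph pattern (?s ?p ?o) and the endpoint URL are placeholders.
FINGERPRINT_QUERY = """
SELECT (SHA256(GROUP_CONCAT(?row; separator="|")) AS ?digest) WHERE {
  { SELECT (CONCAT(STR(?s), " ", STR(?p), " ", STR(?o)) AS ?row) WHERE {
      ?s ?p ?o .
    } ORDER BY ?s ?p ?o }
}
"""

def fingerprint(endpoint_url):
    sparql = SPARQLWrapper(endpoint_url)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(FINGERPRINT_QUERY)
    bindings = sparql.query().convert()["results"]["bindings"]
    return bindings[0]["digest"]["value"]

previous = fingerprint("https://example.org/sparql")
# ... later, before relaunching the pipeline:
if fingerprint("https://example.org/sparql") == previous:
    print("Source data unchanged: the expensive pipeline run can be skipped")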

    Hash-ssessing the freshness of SPARQL pipelines

    The recent increase in RDF usage has been accompanied by a rising need for "verification" of data obtained from SPARQL endpoints. It is now possible to deploy Semantic Web pipelines and to adapt them to a wide range of needs and use cases. In practice, these complex ETL pipelines, which rely on SPARQL endpoints to extract relevant information, often have to be relaunched from scratch every once in a while in order to refresh their data. Such a habit adds load on the network and is resource-intensive, and it is sometimes unnecessary when the data has not changed. In this article, we present a useful method to help data consumers (and pipeline designers) identify when data has been updated in a way that impacts the pipeline's result set. This method is based on standard SPARQL 1.1 features and relies on digitally signing parts of query result sets to inform data consumers about possible changes.
    • …